长期以来,Robotics一直是一个遍布复杂系统体系结构的领域,无论传统或基于学习的模块和联系都需要大量的人类专业知识和先验知识。受大型预训练语言模型的启发,这项工作引入了预先培训的通用表示范式,该范式可以作为给定机器人多个任务的起点。我们提出了感知性因果变压器(PACT),这是一种基于生成变压器的架构,旨在以自我监督的方式直接从机器人数据构建表示形式。通过对状态和行动的自动回归预测,我们的模型隐含地编码了特定机器人的动态和行为。我们的实验评估重点是移动药物的域,我们表明该机器人特定的表示可以作为单个起点,以实现不同的任务,例如安全导航,定位和映射。我们评估了两个形式:使用激光雷达传感器作为感知输入(MUSHR)的轮式机器人,以及使用第一人称RGB图像(栖息地)的模拟药物。我们表明,与训练单个模型的同时训练单个模型相比,对所有任务的单个模型进行训练,并且与独立培训单独的大型模型相当的性能,对每个任务的单个模型进行了可比的训练,则在较大的审计模型上进行了固定小型任务特异性网络,从而使性能明显提高。通过跨任务共享共同的优质表示,我们可以降低整体模型容量并加快此类系统的实时部署。
translated by 谷歌翻译
模拟逼真的传感器是自主系统数据生成的挑战,通常涉及精心手工的传感器设计,场景属性和物理建模。为了减轻这一点,我们引入了一条管道,用于对逼真的激光雷达传感器进行数据驱动的模拟。我们提出了一个模型,该模型可以在RGB图像和相应的LIDAR功能(例如Raydrop或每点强度)之间直接从真实数据集中进行映射。我们表明,我们的模型可以学会编码逼真的效果,例如透明表面上的掉落点或反射材料上的高强度回报。当应用于现成的模拟器软件提供的天真播放点云时,我们的模型通过根据场景的外观预测强度和删除点来增强数据,以匹配真实的激光雷达传感器。我们使用我们的技术来学习两个不同的LIDAR传感器的模型,并使用它们相应地改善模拟的LiDAR数据。通过车辆细分的示例任务,我们表明通过我们的技术增强模拟点云可以改善下游任务性能。
translated by 谷歌翻译
缺失或缺乏输入功能,是许多模型调试工具的基础概念。但是,在计算机视觉中,不能简单地从图像中删除像素。因此,一种倾向于诉诸启发式方法,例如涂黑像素,这反过来又可能引入调试过程中的偏见。我们研究了这样的偏见,特别是展示了基于变压器的架构如何使遗失性更自然地实施,哪些侧架来侧翼这些问题并提高了实践中模型调试的可靠性。我们的代码可从https://github.com/madrylab/missingness获得
translated by 谷歌翻译
Recent advances in neural radiance fields have enabled the high-fidelity 3D reconstruction of complex scenes for novel view synthesis. However, it remains underexplored how the appearance of such representations can be efficiently edited while maintaining photorealism. In this work, we present PaletteNeRF, a novel method for photorealistic appearance editing of neural radiance fields (NeRF) based on 3D color decomposition. Our method decomposes the appearance of each 3D point into a linear combination of palette-based bases (i.e., 3D segmentations defined by a group of NeRF-type functions) that are shared across the scene. While our palette-based bases are view-independent, we also predict a view-dependent function to capture the color residual (e.g., specular shading). During training, we jointly optimize the basis functions and the color palettes, and we also introduce novel regularizers to encourage the spatial coherence of the decomposition. Our method allows users to efficiently edit the appearance of the 3D scene by modifying the color palettes. We also extend our framework with compressed semantic features for semantic-aware appearance editing. We demonstrate that our technique is superior to baseline methods both quantitatively and qualitatively for appearance editing of complex real-world scenes.
translated by 谷歌翻译
The rapid growth of machine translation (MT) systems has necessitated comprehensive studies to meta-evaluate evaluation metrics being used, which enables a better selection of metrics that best reflect MT quality. Unfortunately, most of the research focuses on high-resource languages, mainly English, the observations for which may not always apply to other languages. Indian languages, having over a billion speakers, are linguistically different from English, and to date, there has not been a systematic study of evaluating MT systems from English into Indian languages. In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics. Our results show that pre-trained metrics, such as COMET, have the highest correlations with annotator scores. Additionally, we find that the metrics do not adequately capture fluency-based errors in Indian languages, and there is a need to develop metrics focused on Indian languages. We hope that our dataset and analysis will help promote further research in this area.
translated by 谷歌翻译
Flooding is one of the most disastrous natural hazards, responsible for substantial economic losses. A predictive model for flood-induced financial damages is useful for many applications such as climate change adaptation planning and insurance underwriting. This research assesses the predictive capability of regressors constructed on the National Flood Insurance Program (NFIP) dataset using neural networks (Conditional Generative Adversarial Networks), decision trees (Extreme Gradient Boosting), and kernel-based regressors (Gaussian Process). The assessment highlights the most informative predictors for regression. The distribution for claims amount inference is modeled with a Burr distribution permitting the introduction of a bias correction scheme and increasing the regressor's predictive capability. Aiming to study the interaction with physical variables, we incorporate Daymet rainfall estimation to NFIP as an additional predictor. A study on the coastal counties in the eight US South-West states resulted in an $R^2=0.807$. Further analysis of 11 counties with a significant number of claims in the NFIP dataset reveals that Extreme Gradient Boosting provides the best results, that bias correction significantly improves the similarity with the reference distribution, and that the rainfall predictor strengthens the regressor performance.
translated by 谷歌翻译
Recent advancements in sensing and communication facilitate obtaining high-frequency real-time data from various physical systems like power networks, climate systems, biological networks, etc. However, since the data are recorded by physical sensors, it is natural that the obtained data is corrupted by measurement noise. In this paper, we present a novel algorithm for online real-time learning of dynamical systems from noisy time-series data, which employs the Robust Koopman operator framework to mitigate the effect of measurement noise. The proposed algorithm has three main advantages: a) it allows for online real-time monitoring of a dynamical system; b) it obtains a linear representation of the underlying dynamical system, thus enabling the user to use linear systems theory for analysis and control of the system; c) it is computationally fast and less intensive than the popular Extended Dynamic Mode Decomposition (EDMD) algorithm. We illustrate the efficiency of the proposed algorithm by applying it to identify the Van der Pol oscillator, the IEEE 68 bus system, and a ring network of Van der Pol oscillators.
translated by 谷歌翻译
Knowledge about outcomes is critical for complex event understanding but is hard to acquire. We show that by pre-identifying a participant in a complex event, crowd workers are able to (1) infer the collective impact of salient events that make up the situation, (2) annotate the volitional engagement of participants in causing the situation, and (3) ground the outcome of the situation in state changes of the participants. By creating a multi-step interface and a careful quality control strategy, we collect a high quality annotated dataset of 8K short newswire narratives and ROCStories with high inter-annotator agreement (0.74-0.96 weighted Fleiss Kappa). Our dataset, POQue (Participant Outcome Questions), enables the exploration and development of models that address multiple aspects of semantic understanding. Experimentally, we show that current language models lag behind human performance in subtle ways through our task formulations that target abstract and specific comprehension of a complex event, its outcome, and a participant's influence over the event culmination.
translated by 谷歌翻译
Modeling the risk of extreme weather events in a changing climate is essential for developing effective adaptation and mitigation strategies. Although the available low-resolution climate models capture different scenarios, accurate risk assessment for mitigation and adaption often demands detail that they typically cannot resolve. Here, we develop a dynamic data-driven downscaling (super-resolution) method that incorporates physics and statistics in a generative framework to learn the fine-scale spatial details of rainfall. Our method transforms coarse-resolution ($0.25^{\circ} \times 0.25^{\circ}$) climate model outputs into high-resolution ($0.01^{\circ} \times 0.01^{\circ}$) rainfall fields while efficaciously quantifying uncertainty. Results indicate that the downscaled rainfall fields closely match observed spatial fields and their risk distributions.
translated by 谷歌翻译
In a typical car-following scenario, target vehicle speed fluctuations act as an external disturbance to the host vehicle and in turn affect its energy consumption. To control a host vehicle in an energy-efficient manner using model predictive control (MPC), and moreover, enhance the performance of an ecological adaptive cruise control (EACC) strategy, forecasting the future velocities of a target vehicle is essential. For this purpose, a deep recurrent neural network-based vehicle speed prediction using long-short term memory (LSTM) and gated recurrent units (GRU) is studied in this work. Besides these, the physics-based constant velocity (CV) and constant acceleration (CA) models are discussed. The sequential time series data for training (e.g. speed trajectories of the target and its preceding vehicles obtained through vehicle-to-vehicle (V2V) communication, road speed limits, traffic light current and future phases collected using vehicle-to-infrastructure (V2I) communication) is gathered from both urban and highway networks created in the microscopic traffic simulator SUMO. The proposed speed prediction models are evaluated for long-term predictions (up to 10 s) of target vehicle future velocities. Moreover, the results revealed that the LSTM-based speed predictor outperformed other models in terms of achieving better prediction accuracy on unseen test datasets, and thereby showcasing better generalization ability. Furthermore, the performance of EACC-equipped host car on the predicted velocities is evaluated, and its energy-saving benefits for different prediction horizons are presented.
translated by 谷歌翻译